The Application of a Simulated Annealing Fuzzy Clustering Algorithm for Cancer Diagnosis
Abstract
Fourier Transform Infrared Spectroscopy (FTIR) is becoming a powerful tool for the study of biomedical conditions, including cancer diagnosis. As part of an ongoing programme of research into the potential early diagnosis of cervical cancer, Hierarchical Cluster Analysis (HCA) and Fuzzy C-Means (FCM) have been applied to distinguish FTIR spectra obtained from cancerous and non-cancerous cells. In recent experiments on FTIR spectra that had not been pre-processed, the FCM method was shown to achieve significantly better results. Nevertheless, this technique has two limitations. Firstly, the number of clusters must be assumed a priori, which is a problem because, in general, it is not known in medical diagnosis. Secondly, FCM relies on a greedy local search, so it may return sub-optimal solutions and misdiagnosis could therefore occur. Bandyopadhyay [8] has recently proposed a Simulated Annealing Fuzzy Clustering (SAFC) algorithm which can avoid these two limitations. However, when we implemented the proposed algorithm, we found that sub-optimal solutions could still be obtained in certain circumstances. In this paper, we extend the SAFC algorithm to overcome this difficulty and apply the modified version to the classification of seven sets of FTIR spectra taken from three oral cancer patients. With no prior specification of the number of clusters, our modified SAFC algorithm obtains the correct (clinical) classification of clusters in 4 out of 7 data sets. In the remaining 3 data sets it produces a number of clusters which, while differing from the clinical classification, appear to better match the underlying data when subjectively visualised using Principal Component Analysis (PCA).
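To make the two FCM limitations named in the abstract concrete, the following is a minimal sketch of the standard FCM alternating update, written with NumPy. It is not the SAFC algorithm and not the authors' implementation. The function names `fuzzy_c_means` and `objective`, the fuzzifier default `m=2.0`, and the stopping tolerance are assumptions chosen for illustration. Note that the cluster count `c` must be supplied by the caller, and that the loop only ever moves downhill from its random starting point, so it can settle in a local optimum.

```python
import numpy as np


def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Illustrative standard FCM; `c` (number of clusters) must be given up front."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial membership matrix; each row sums to 1.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        # Centre update: membership-weighted mean of the data.
        W = U ** m
        centres = (W.T @ X) / W.sum(axis=0)[:, None]
        # Membership update from distances to the current centres.
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centres so they match the final memberships.
    W = U ** m
    centres = (W.T @ X) / W.sum(axis=0)[:, None]
    return centres, U


def objective(X, centres, U, m=2.0):
    """FCM objective: membership-weighted sum of squared distances."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return float(((U ** m) * d2).sum())
```

Running `fuzzy_c_means` on the same data with different values of `c`, or with different `seed` values, and comparing `objective` gives a feel for both limitations: the result depends on the assumed cluster count and on the starting point of the search.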
Similar resources
Simulated Annealing Fuzzy Clustering in Cancer Diagnosis
Classification is an important research area in cancer diagnosis. Fuzzy C-means (FCM) is one of the most widely used fuzzy clustering algorithms in real-world applications. However, this method has two major limitations. The first is that the number of clusters must be given in advance. The second is that the FCM technique can get stuck in sub-optimal solutions. In o...
An Approach to Reducing Overfitting in FCM with Evolutionary Optimization
Fuzzy clustering methods are conveniently employed in constructing a fuzzy model of a system, but they require some parameters to be tuned. In this research, FCM is chosen for fuzzy clustering. Parameters such as the number of clusters and the value of the fuzzifier significantly influence the extent of generalization of the fuzzy model. These two parameters require tuning to reduce the overfitting in the...
A Fuzzy C-means Algorithm for Clustering Fuzzy Data and Its Application in Clustering Incomplete Data
The fuzzy c-means clustering algorithm is a useful tool for clustering, but it is suitable only for crisp, complete data. In this article, an enhancement of the algorithm is proposed which is suitable for clustering trapezoidal fuzzy data. A linear ranking function is used to define a distance for trapezoidal fuzzy data. Then, as an application, a method based on the proposed algorithm is pres...
A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)
Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using metaheuristic algorithms. In this study, we propose a simulated annealing based algorithm for K-means in clustering analysis, which we refer it a...
Fuzzy Multi-Period Mathematical Programming Model for Maximal Covering Location Problem
In this paper, a model is presented to locate ambulances, considering a backup facility (to increase reliability) and the restriction of ambulance capacity. This model is designed for emergencies. In this model, the covered demand for each demand point depends on the number of coverage times and the amount of demand. The demand amount and ambulance coverage radius are considered...